Gemini Nano


CAG: Chunked Augmented Generation for Google Chrome's Built-in Gemini Nano

Surulimuthu, Vivek Vellaiyappan, Rao, Aditya Karnam Gururaj

arXiv.org Artificial Intelligence

Integrating Gemini Nano into Google Chrome marks a revolutionary shift in browser capabilities, transforming it from a simple content delivery platform into an intelligent processing environment. This native AI integration addresses several longstanding challenges: it eliminates external API dependencies, enhances privacy through local processing, and democratizes AI access by making these capabilities available to all Chrome users without additional software or API requirements. However, browser-based AI models face a significant constraint in their limited context window size, which restricts their ability to process larger inputs like extensive documents or codebases. This limitation emerges from the necessary balance between model capability and browser performance constraints, potentially hindering real-world applications requiring substantial data processing. To address this challenge, we introduce Chunked Augmented Generation (CAG), an architectural framework specifically designed for Chrome's Gemini Nano implementation.
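The core idea behind chunked processing of over-long inputs can be illustrated with a minimal sketch. This is not the authors' CAG implementation; it simply shows the general pattern of splitting text that exceeds a model's context window into bounded chunks, running the model on each chunk, and aggregating the results. The `generate` callable is a hypothetical stand-in for a call to an on-device model such as Gemini Nano, and the character-based limit is a simplification of a real token budget.

```python
def chunk_text(text: str, max_chars: int) -> list[str]:
    """Split `text` into pieces no longer than `max_chars`,
    preferring to break on whitespace so words stay intact."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last space inside the window, if any.
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append(text[start:end].strip())
        start = end
    return chunks

def chunked_generate(text: str, generate, max_chars: int = 1000) -> str:
    """Run `generate` (a per-chunk model call) on each chunk and
    join the partial outputs into one result."""
    partials = [generate(chunk) for chunk in chunk_text(text, max_chars)]
    return " ".join(partials)

# Example with a stand-in "model" that just uppercases its input:
result = chunked_generate("hello world", lambda c: c.upper(), max_chars=6)
```

A real implementation would also need to handle cross-chunk context (e.g. carrying a running summary between calls), which this sketch omits.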


Evidence of Cognitive Deficits and Developmental Advances in Generative AI: A Clock Drawing Test Analysis

Galatzer-Levy, Isaac R., McGiffin, Jed, Munday, David, Liu, Xin, Karmon, Danny, Labzovsky, Ilia, Moroshko, Rivka, Zait, Amir, McDuff, Daniel

arXiv.org Artificial Intelligence

Generative AI's rapid advancement sparks interest in its cognitive abilities, especially given its capacity for tasks like language understanding and code generation. This study explores how several recent GenAI models perform on the Clock Drawing Test (CDT), a neuropsychological assessment of visuospatial planning and organization. While models create clock-like drawings, they struggle with accurate time representation, showing deficits similar to mild-to-severe cognitive impairment (Wechsler, 2009). Errors include numerical sequencing issues, incorrect clock times, and irrelevant additions, despite accurate rendering of clock features. Only GPT 4 Turbo and Gemini Pro 1.5 produced the correct time, scoring like healthy individuals (4/4). A follow-up clock-reading test revealed only Sonnet 3.5 succeeded, suggesting drawing deficits stem from difficulty with numerical concepts. These findings may reflect weaknesses in visual-spatial understanding, working memory, or calculation, highlighting strengths in learned knowledge but weaknesses in reasoning. Comparing human and machine performance is crucial for understanding AI's cognitive capabilities and guiding development toward human-like cognitive functions.


The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks

Galatzer-Levy, Isaac R., Munday, David, McGiffin, Jed, Liu, Xin, Karmon, Danny, Labzovsky, Ilia, Moroshko, Rivka, Zait, Amir, McDuff, Daniel

arXiv.org Artificial Intelligence

There is increasing interest in tracking the capabilities of general intelligence foundation models. This study benchmarks leading large language models and vision language models against human performance on the Wechsler Adult Intelligence Scale (WAIS-IV), a comprehensive, population-normed assessment of underlying human cognition and intellectual abilities, with a focus on the domains of Verbal Comprehension (VCI), Working Memory (WMI), and Perceptual Reasoning (PRI). Most models demonstrated exceptional capabilities in the storage, retrieval, and manipulation of tokens such as arbitrary sequences of letters and numbers, with performance on the Working Memory Index (WMI) at or above the 99.5th percentile when compared to human population normative ability. Performance on the Verbal Comprehension Index (VCI), which measures retrieval of acquired information and linguistic understanding about the meaning of words and their relationships to each other, was also consistently at or above the 98th percentile. Despite these broad strengths, we observed consistently poor performance on the Perceptual Reasoning Index (PRI; range 0.1-10th percentile) from multimodal models, indicating a profound inability to interpret and reason on visual information. Smaller and older model versions consistently performed worse, indicating that increases in training data and parameter count, along with advances in tuning, are driving significant gains in cognitive ability.


Google reverses course and brings its Gemini AI to the regular Pixel 8

Engadget

Google will bring Gemini, the company's new large language model, to Pixel 8 smartphones after all. The phone will incorporate Gemini Nano, a version of the model built to run locally on personal devices. This follows a successful rollout to the Pixel 8 Pro late last year and the Samsung Galaxy S24 in January. The Pixel 8 features the same proprietary Tensor G3 chip as the Pro, which was designed to speed up AI performance, so the overall experience should be similar on both devices.


Google's Gemini AI takes aim at OpenAI and Microsoft's GPT-4

PCWorld

Last week, Google launched its new AI, or rather its new large language model, dubbed Gemini. The Gemini 1.0 model is available in three versions: Gemini Nano is intended for on-device tasks, Gemini Pro is positioned as the best option for a wider range of tasks, and Gemini Ultra is Google's largest language model, built to handle the most complex tasks you can give it. Something Google was keen to highlight at the launch was that Gemini Ultra outperformed the latest version of OpenAI's GPT-4 in 30 of the 32 most commonly used benchmarks for measuring the capabilities of language models. The tests cover everything from reading comprehension and various math questions to writing Python code and image analysis. In some of the tests the difference between the two models was only a few tenths of a percentage point, while in others it was up to ten percentage points.


Google DeepMind Unveils Its Most Powerful AI Offering Yet

TIME - Tech

Google DeepMind has announced its much-anticipated family of artificial intelligence chatbots, Gemini, which will compete with OpenAI's GPT series. According to Google, Gemini Ultra, its largest and most capable new model, outperforms OpenAI's most capable model, GPT-4, at a number of text-based, image-based, coding, and reasoning tasks. Gemini Ultra will be available through a new AI chat feature called Bard Advanced from early next year, the company said. It is currently being refined and is undergoing "trust and safety checks, including red-teaming by trusted external parties," according to the announcement. Google DeepMind also announced the launch of Gemini Pro, which is now available to the public through Google's Bard chat interface, and the smaller Gemini Nano, which will run on Google's Pixel 8 Pro smartphone.


Google's Gemini AI is coming to Android

Engadget

Google is bringing Gemini, the new large language model it just introduced, to Android, beginning with the Pixel 8 Pro. The company's flagship smartphone will run Gemini Nano, a version of the model built specifically to run locally on smaller devices, Google announced in a blog post. The Pixel 8 Pro is powered by the Google Tensor G3 chip designed to speed up AI performance. This lets the Pixel 8 Pro add several smarts to existing features. The phone's Recorder app, for instance, has a Summarize feature that currently needs a network connection to give you a summary of recorded conversations, interviews, and presentations.